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A powerful and well-established tool for free-energy estimation is Bennett's acceptance ratio 
method. Central properties of this estimator, which employs samples of work values of a forward 
and its time reversed process, are known: for given sets of measured work values, it results in the 
best estimate of the free-energy difference in the large sample limit. Here we state and prove a 
further characteristic of the acceptance ratio method: the convexity of its mean square error. As a 
two-sided estimator, it depends on the ratio of the numbers of forward and reverse work values used. 
Convexity of its mean square error immediately implies that there exists an unique optimal ratio 
for which the error becomes minimal. Further, it yields insight into the relation of the acceptance 
ratio method and estimators based on the Jarzynski equation. As an application, we study the 
performance of a dynamic strategy of sampling forward and reverse work values. 
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I. INTRODUCTION 

A quantity of central interest in thermodynamics and 
statistical physics is the (Hclmholtz) free-energy, as it 
determines the equilibrium properties of the system un- 
der consideration. In practical applications, e.g. drug 
design, molecular association, thermodynamic stability, 
and binding affinity, it is usually sufficient to know 
free-energy differences. As recent progress in statistical 
physics has shown, free-energy differences, which refer to 
equilibrium, can be determined via non-equilibrium pro- 
cesses [l|, 0. 

Typically, free-energy differences are beyond the scope 
of analytic computations and one needs to measure them 
experimentally or compute them numerically. Highly 
efficient methods have been developed in order to esti- 
mate free-energy differences precisely, including thermo- 
dynamic integration yL m, free-energy perturbation Q, 
umbrella sampling fmJB , adiabatic switching Q , dy- 
namic methods [10l.llll.ll2l]. asymptotics of work distribu- 
tions [13| , optimal protocols UJM, ta rgeted and escorted 
free-energy perturbatio n fl5l . UM . fl7l. [H, [Hj]. 

A powerful @, EH El and frequently [H El El 
used method for free-energy determination is two-sided 
estimation, i.e. Bennett's acceptance ratio method [26| . 
which employs a sample of work values of a driven 
nonequilibrium process together with a sample of work 
values of the time-reversed process (27j . 

The performance of two-sided free-energy estimation 
depends on the ratio 



of the number of forward and reverse work values used. 
Think of an experimenter who wishes to estimate the 
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free-energy difference with Bennett's acceptance ratio 
method and has the possibility to generate forward as 
well as reverse work values. The capabilities of the ex- 
periment give rise to an obvious question: if the total 
amount of draws is intended to be N = no + ni, which is 
the optimal choice of partitioning N into the numbers Hq 
of forward and n\ of reverse work values, or equivalently, 
what is the optimal choice r of the ratio r? The prob- 
lem is to determine the value of r that minimizes the 
(asymptotic) mean square error of Bennett's estimator 
when N = no + n± is held constant. 

While known since Bennett [III, the optimal ratio is 
underutilized in the literature. Bennett himself proposed 
to use a suboptimal equal time strategy, instead, because 
his estimator for the optimal ratio converges too slowly in 
order to be practicable. Even questions as fundamental 
as the existence and uniqueness are unanswered in the lit- 
erature. Moreover, it is not always clear a priori whether 
two-sided free-energy estimation is better than one-sided 
exponential work averaging. For instance, Shirts et al. 
have presented a physical example where it is optimal to 
draw work values from only one direction [28[ . 

The paper is organized as follows: in Sees. ITT1 and ITTT1 
we rederive two-sided free-energy estimation and the op- 
timal ratio. We also remind that two-sided estimation 
comprises one-sided exponential work averaging as limit- 
ing cases for lnr — > ±oo, a result that is also true for the 
mean square errors of the corresponding estimators. 

The central result is stated in Sec. HVt the asymptotic 
mean square error of two-sided estimation is convex in the 
fraction ^ of forward work values used. This fundamen- 
tal characteristic immediately implies that the optimal 
ratio r a exists and is unique. Moreover, it explains the 
generic superiority of two-sided estimation if compared 
with one-sided, as found in many applications. 

To overcome the slow convergence of Bennett's esti- 
mator of the optimal ratio, which is based on estimating 
second moments, in Sec. |V] we transform the problem 
into another form such that the corresponding estimator 
is entirely based on first moments, which enhances the 
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FIG. 1: (Color online) The overlap density p a (w) bridges 
the densities po(w) and pi(w) of forward and reverse work 
values, respectively, a is the fraction — ^ — of forward work 
values, here schematically shown for a = 0.0001, a = 0.5, and 
a — 0.9999 . The accuracy of two-sided free-energy estimates 
depends on how good p a (w) is sampled when drawing from 
Po(w) and pi(w). 



convergence enormously. 

As an application, in Sec. IVIII we present a dynamic 
strategy of sampling forward and reverse work values that 
maximizes the efficiency of two-sided free-energy estima- 
tion. 



II. TWO-SIDED FREE-ENERGY ESTIMATION 

Given a pair of samples of no forward and n\ reverse 
work values drawn from the probability densities po(w) 
and pi(w) of forward and reverse work values and pro- 
vided the latter are related to each other via the fluctu- 
ation theorem (2j, 



Po(w) 
pi(w) 



(2) 



Bennett's acceptance ratio method (2fJ, |2g, l27|, |29( is 
known to give the optimal estimate of the free-energy dif- 
ference A/ in the limit of large sample sizes. Throughout 
the paper, A/ = AF/kT and w — W/kT are understood 
to be measured in units of the thermal energy kT. The 
normalized probability densities po(w) emdpi(w) are as- 
sumed to have the same support fl, and we choose the 
following sign convention: po(w) := Pforward(+w) and 
Pi(w) := p IeveIae (-w). 



Now define a normalized density p a (w) with 



Pa{w) = 



1 



Po(w)pi(w) 



U a ap (w) + f3pi(w) ' 
w G O, where a G [0, 1] is a real number and 

-0 = 1. 



a 



The normalization constant U a is given by 



PoPi 



ap Q + (3pi 



-dw. 



(3) 



(4) 



(5) 



The density p a (w) is a normalized harmonic mean of po 

-l 



and pi , 



ap +/3pi 



Pl ^ Po 



, and thus bridges be- 



tween po and pi, see Fig. [TJ In the limit a — > 0, p a (w) 
converges to the forward work density po(w), and con- 
versely for a — * 1 it converges to the reverse density 
Pi(w). As a consequence of the inequality of the har- 



monic and arithmetic mean 
U a is bounded from above by unity, 



ai + 0±- 

Pl ' Po 



U a < 1 



< apx+fipo, 



(6) 



Va G [0,1]. Except for a = and a = 1, the equal- 
ity holds if and only if p = p\. Using the fluctuation 
theorem ([2J , U a can be written as an average in po and 
Pi, 



U a 



1 



a + f3e- w+A f / 1 \0 + ae w - A f 



1 



(7) 



where the angular brackets with subscript 7 G [0, 1] de- 
note an ensemble average with respect to p 7 , i.e. 



(g\ = / g{w)p 1 {w)dw 



(8) 



for an arbitrary function g(w). 

In setting a = 1 , Eq. ([7]) reduces to the nonequilibrium 
work relation [l[ 



1 = <e- + n 



(9) 



in the forward direction, and conversely with a = we 
obtain the nonequilibrium work relation in the reverse 
direction, 



1 



(e W - Af \ 



(10) 



The last two relations can, of course, be obtained more 
directly from the fluctuation theorem |2j . An important 
application of these relations is the one-sided free-energy 
estimation: Given a sample {wj . . . w^} of N forward 
work values drawn from po, Eq. ^ is commonly used to 

define the forward estimate A/ of A/ with 



A/ 



1 N 

inly, 

N ^ 

k=l 



(11) 



Conversely, given a sample {w{ . . . w^} of N reverse work 
values drawn from pi , Eq. (fT0|) suggests the definition of 

the reverse estimate Af 1 of A/, 



1 



A' 



(12) 



If we have drawn both, a sample of no forward and a 
sample of ni reverse work values, then Eq. ([7]) can serve 
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us to define a two-sided estimate A/ of A/ by replacing 
the ensemble averages with sample averages: 



1 ni 1 

— Y — - 



(13) 



A/ is understood to be the unique root of Eq. (fl~3)) , which 
exists for any a E [0, 1]. Different values of a result in 
different estimates for A/. Choosing 



a 



no 
N ' 



= 



ni 

iV" 



(14) 



N = no + ni, the estimate (fT3"|) coincides with Bennett's 
optimal estimate, which defines the two-sided estimate 
with least asymptotic mean square error for a given value 
a = Jmm or equivalently, for a given ratio r = ^ = ^ 
[20l. 1261]. We denote the optimal two-sided estimate, i.e. 
the solution of Eq. (fT3|) under the constraint (fT4)l . by 

A/ 1 _ Q and simply refer to it as the two-sided estimate. 
Note that the optimal estimator can be written in the 
familiar form 



two-sided estimate can be checked with the convergence 
measure introduced in Ref. 19]. 

In the limits a = ^ — > 1 and a — > 0, respectively, 
the asymptotic mean square error X of the two-sided 
estimator converges to the asymptotic mean square error 
of the appropriate one-sided estimator [30l ]. 



hm X(N, a) = - Var ^ = - Var ( e -™ +A/ ) (18) 
a->l Jy \po J JSI 



and 



teaX{N t a) = 1 Va ri (gJ = 1 Var^"^) , (19) 

where Var 7 denotes the variance operator with respect 
to the density p 7 , i.e. 



Var. 



(9) = ((.9-(5) 7 ) 1 



(20) 



for an arbitrary function g(w) and 7 € [0, 1] 



E 



1 



E 



1 



til + e < T JT "o fc= il + e k 1 "o 



(15) 



In the limit a = 77- 



1 the two-sided estimate reduces 

1 



to the one-sided forward estimate (fTTj) . Af 1 _ a "^i A/ , 

and conversely Af 1 _ a A/ x . Thus the one-sided es- 
timates are the optimal estimates if we have given draws 
from only one of the densities po or pi . 

A characteristic quantity to express the performance 
of the estimate A/ 1 _ Q is the mean square error, 



(A/ X _ a - A/ 



(16) 



which depends on the total sample size N = n + n\ and 
the fraction a = Here, the average is understood to 
be an ensemble average in the value distribution of the 
estimate A/ 1 _ Q for fixed N and a. In the limit of large 
no and m, the asymptotic mean square error X (which 
then equals the variance) can be written [2(1 12r3 ] 



X(N,a) 



1 1 



iVa/3 V^a 



1 



- 1 



(17) 



Provided the r.h.s. of Eq. (fTTj) exists, which is guaran- 
teed for any a € (0, 1), the ^-dependence of X is simply 
given by the usual -^--factor, whereas the a-dependence 
is determined by the function U a given in Eq. ([5]). Note 

that if a two-sided estimate A f 1 _ a is calculated, then es- 
sentially the normalizing constant U a is estimated from 
two sides, and 1, cf. Eqs. ([7]) and (fT3"|) . With an es- 
timate A/ 1 _ Q we therefore always have an estimate of 
the mean square error at hand. However, the reliability 
of the latter naturally depends on the degree of conver- 
gence of the estimate A/ 1 _ a . The convergence of the 



III. THE OPTIMAL RATIO 

Now we focus on the question raised in the introduc- 
tion: Which value a of a in the range [0, 1] minimizes 
the mean square error (|17[) when the total sample size, 
N = n + ni, is held fixed? 

Let M be the rescaled asymptotic mean square error 
given by 



M(a) =N-X(N,a), 



(21) 



which is a function of a only. Assuming a £ (0,1), 
a necessary condition for a minimum of M is that the 
derivative M'(a) = =p- of M vanishes at a . Before 
calculating M' explicitly, it is beneficial to rewrite M by 
using the identity 

U a = I E2Eli^±£Pll dw 



(apo + /3pi) 



■0> 



pi 



(apo + (3pif / 1 \(apo+ f3pi) 2 f Q 



(22) 



Subtracting (a + (3)U^ = U% from Eq. (|2"2")l and recalling 
the definition (J3]) of p a , one obtains 

U a {l-U a ) = [081(a) +0O o (a)]U*, (23) 

where the functions 9i are defined as 

9l(o) = v^)=L V a tI (- T5; L__), 

eo(o ) = Va t „(^)=L v „ ( ?T -L rz7 ). (24) 
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9q and 9\ describe the relative fluctuations of the quanti- 
ties that are averaged in the two-sided estimation of A/, 
cf. Eq. (H3J. 

With the use of formula (j2"3"|) . M can be written 



(25) 



M(a) = + — — 

a 



and the derivative yields 

M / (a) = gjW M") , gg^±<M , (26) 



/? 2 



a/3 



The derivatives of the ^-functions involve the first two 
derivatives of U a , which will thus be computed first: 



d( x J (ap +/3p 1 y 



(27) 



and 



U" 



da 2 



U a = 2 



PoPi (Pi -PoY 
{apo + 0p{f 



dw. (28) 



From this equation it is clear that U a is convex in a, 
U' a ' > 0, with a unique minimum in (0, 1) (as Uq = U± = 
1). We can rewrite the ^-functions with U a and U' a as 
follows: 



Ua - j3U' a 

h[OL) = — 1, 



U' 2 



)(«) 



Ua + €dJ' a 1 



(29) 



Differentiating these expressions gives 



^) = -jk{Ku a -2U a x ) 1 

U a 

8' (*) = ^{U a 'U a -2U a 2 ). 



(30) 



#o and 9\ are monotonically increasing and decreasing, 
respectively. This immediately follows from writing the 
term occurring in the brackets of Eqs. (|30p as a variance 
in the density p a , 

U:U a -2U' 2 = 2 Vari Pl ~ ™ ) U 2 , (31) 



(32) 



V apo + 0pi 

which is thus positive. 

As a consequence of Eq. ([3"0]) , the relation 

pe' (a) + a6[(a) = Vae [0, 1] 

holds and M' reduces to 

9i(a) 6 {a) 



M'{a) = 



(33) 



The derivatives of the ^-functions do not contribute to 
M' due to the fact that the special form of the two-sided 



estimator (|13p originates from minimizing the asymptotic 
mean square error, cf. [26| . The necessary condition for 
a local minimum of M at a Q , M'(a ) — 0, now reads 



0t = giK) 

al {a o )' 



(34) 



where O — 1 — a is introduced. Using Eqs. (|24|) and 
@, the condition (fM)) results in 



Vari 



1 



1 _|_ e -io+A/+lnr e 



Var 



1 



^ _|_ giu— A/— lnr c 



(35) 



This means, the optimal ratio r a is such that the vari- 
ances of the random functions which are averaged in the 
two-sided estimation (|15l) are equal. However, the exis- 
tence of a solution of M'(a) — is not guaranteed in 
general. 

Writing Eq. (|55|) in the form 

Var/^-^)=Var (^-M (36) 
V apo + ppij \ ap + 0Pi J 

prevents the equation from becoming a tautology. 



IV. CONVEXITY OF THE MEAN SQUARE 
ERROR 

Theorem. The asymptotic mean square error M(a) is 
convex in a. 

In order to prove the convexity, we introduce the op- 
erator T Q (/) which is defined for an arbitrary function 
f(w) by 

T a (/) = Var (/) + a Van(/) - U a Var a (/) . (37) 
Lemma. T a is positive semidefinite, i.e. 

r Q (/)>o v/H. (38) 

For a G (0,1) and f(w) ^ const., the equality holds if 
and only if po = Pi- 

Proof of the Lemma. Let <5/ 7 = f(w) — (/)„, 7 € [0,1]. 
Then 

T Q (/) = / (WoPo + aSf 2 Pl - Sfl PoP \ ! dw 



dw 



' ap + 0p\ 



(ffi/oPo + aSf 2 Pl ) (apo + Pl ) - 6f 2 p oPl 
ap + 0pi 

= af3 f {Shn - Sfopof dw 

J ap + ppi 



+ U a (0(f) o + a(f) 1 -(f) Q ) 2 , (39) 

which is clearly positive. Provided / ^ const, and a ^ 
0, 1, the integrand in the last line is zero Ww if and only 
if po = pi . This completes the proof of the Lemma. □ 
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Proof of the Theorem. Consulting Eqs. (|33| and (|32f . the 
second derivative of M reads 

M (a) = 2 + —)- ^*o(«)- (40) 

Expressing p^ — p — (3d and pi = p + ad in center- and 
relative "coordinates" p = ap + [3pi and d = Pi — Po, 
respectively, gives 



1 ,r /3 



'' w = SI V "'l7j = % Vflr 'U 

„, . , 2a /d 
% (« =7r Va M- 



(41) 



Therefore, ^af3U^,M" = Tq, (=) , which is positive accord- 
ing to the Lemma. □ 

The convexity of the mean square error is a fundamen- 
tal characteristic of Bennett's acceptance ratio method. 
This characteristic allows us to state a simple criterion for 
the existence of a local minimum of the mean square er- 
ror in terms of its derivatives at the boundaries. Namely, 
if 

M'(Q) = Var 1 (e u '- A/ ) - Var (e™- A/ ) (42) 
is negative and 



M'(l) = Var 1 (e- ,i, + A/ ) - Vai^e"™^) 



is positive there exists a local minimum of M(a) for 
a G (0,1). Otherwise, no local minimum exists and 
the global minimum is found on the boundaries of a: 
if M'(0) > 0, the global minimum is found for a = 0, 
thus it is optimal to measure work values in the reverse 
direction only and to use the one-sided reverse estimator 
(fT2"]) . Else, if Af'(l) < 0, the global minimum is found 
for a = 1, implying the one-sided forward estimator pip 
to be optimal. 

In addition, the convexity of the mean square error 
proves the existence and uniqueness of the optimal ratio, 
since a convex function has a global minimum on a closed 
interval. 

Corollary. If a solution of M'{a) = exists, it is unique 
and M(a) attains its global minimum (a G [0, 1]) there. 



still one obstacle to overcome. Yet, all expressions for 
estimating the optimal ratio are based on second mo- 
ments, see e.g. Eq. (f3"5j). Due to convergence issues, it is 
not practicable to base any estimator on expressions that 
involve second moments. The estimator would converge 
far too slowly. For this reason, we transform the problem 
into a form that employs first moments, only. 

Assume we have given no and n\ work values in for- 
ward and reverse direction, respectively, and want to es- 
timate U a , with < a < 1. According to Eq. we can 
estimate the overlap measure U a by using draws from the 
forward direction, 



i 



fib- 



(44) 



where b equals 1 — a and for A/ the best available esti- 
mate of A/ is inserted, i.e. the two-sided estimate based 
on the no + m work values. Similarly, we can estimate 
the overlap measure by using draws from the reverse di- 
rection, 



-E 



1 



711 l^i a + be- w ' +Af 



(45) 



Since in general draws from both directions are available, 
it is reasonable to take an arithmetic mean of both esti- 
mates 



(46) 



(43) where the weighting is chosen such that the better esti- 



mate, or Ua i contributes stronger: with increasing 
a the estimate ujp becomes more reliable, as U a is the 
normalizing constant of the bridging density p a , Eq. ([3]), 

Pi; and conversely for decreasing a. 



and p a - 

From the estimate of the overlap measure we can esti- 
mate the rescaled mean square error by 



Ma = — ^ - 1 
ab \U a 



(47) 



for all a G (0, 1), a result that is entirely based on first 
moments. The infimum of M(a) finally results in an es- 

N ' 



timate a of the optimal choice a of — 



a a M(a )=MM(a). (48) 

a 

When searching for the infimum, we also take 



V. ESTIMATING THE OPTIMAL RATIO WITH 
FIRST MOMENTS 

In situations of practical interest the optimal ratio is 
not available a priori. Thus, we are going to estimate 
the optimal ratio. There exist estimators of the optimal 
ratio since Bennett. In addition we have just proven that 
the optimal ratio exists and is unique. However there is 



, no 

M(o) = ^£X ,, - A / 

fc=l 



711 



M{1) 



— y p-^ (1)+A / 



i=i 



l 

no 



"0 



(49) 



e -w£>+Af 



k=l 



into account which follow from a series expansion of 
Eq. P7|) in a at a — and a = 1, respectively. 
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VI. INCORPORATING COSTS 

The costs of measuring a work value in forward direc- 
tion may differ from the costs of measuring a work value 
in reverse direction. The influence of costs on the optimal 
ratio of sample sizes is investigated here. 

Different costs can be due to a direction dependent 
effort of experimental or computational measurement of 
work (unfolding a RNA may be much easier than folding 
it). We assume the work values to be uncorrelated, which 
is essential for the validity of the theory presented in this 
paper. Thus, a source of noncqual costs, which arises 
especially when work values are obtained via computer 
simulations, is the difference in the strength of correla- 
tions of consecutive Monte-Carlo steps in forward and 
reverse direction. To achieve uncorrelated draws, the 
"correlation-lengths" or "correlation-times" have to be 
determined within the simulation, too. However, this is 
advisable in any case of two-sided estimation, indepen- 
dent of the sampling strategy. 

Let Co and c\ be the costs of drawing a single forward 
and reverse work value, respectively. Our goal is to min- 
imize the mean square error X = j?M while keeping the 
total costs c = n co + n\Cx constant. Keeping c constant 
results in 



N(c,a) 
which in turn yields 

X(c. a) 



ac + (3ci 



(50) 



N{c, a) 



M(a). 



(51) 



If a local minimum exists, it results from J^X(c, a) 
which leads to 

Pi c 6»i(a o ) 



Ci0 (a o ) 



(52) 



a result Bennett was already aware of [26|. However, 
based on second moments, it was not possible to esti- 
mate the optimal ratio r accurately and reliably. Hence, 
Bennett proposed to use a suboptimal equal time strategy 
or equal cost strategy, which spends an equal amount of 
expenses to both directions, i.e. hqCq — n\C\ — § or 



3 

Cl 



(53) 



where a ec = 1 — (3 ec is the equal cost choice for a = ^ . 
This choice is motivated by the following result 



X(c,a)>^X(c,a ec ) Va € [0, 1] 



which states that the asymptotic mean square error of the 
equal cost strategy is at most sub-optimal by a factor of 
2 [26]. Note however that the equal cost strategy can 
be far more sub-optimal if the asymptotic limit of large 
sample sizes is not reached. 



Since we can base the estimator for the optimal ratio 
r on first moments, see Sec. [V] we propose a dynamic 
strategy that performs better than the equal cost strat- 
egy. The infimum of 



vf ^ ac ° + bci mi \ 
X(c,a) = M{a) 



(55) 



results in the estimate a of the optimal choice a Q of ^ , 
a X(c, a a ) = inf X(c, a). (56) 

a 

We remark that opposed to M(a), X(c, a) is not neces- 
sarily convex. However, a global minimum clearly exists 
and can be estimated. 

VII. A DYNAMIC SAMPLING STRATEGY 

Suppose we want to estimate the free-energy difference 
with the acceptance ratio method, but have a limit on the 
total amount of expenses c that can be spend for measure- 
ments of work. In order to maximize the efficiency, the 
measurements are to be performed such that ^ finally 
equals the optimal fraction a of forward measurements. 

The dynamic strategy is as follows: 

1. In absence of preknowledgc on a Q1 we start with 
Bennett's equal cost strategy (f53"|) as an initial guess 
of a . 

2. After drawing a small number of work values we 
make preliminary estimates of the free-energy dif- 
ference, the mean square error, and the optimal 
fraction ct Q . 

3. Depending onwhether the estimated rescaled mean 
square error M(a) is convex, which is a necessary 
condition for convergence, our algorithm updates 
the estimate a of a . 

4. Further work values are drawn such that ^ dynam- 
ically follows S , while a is updated repeatedly. 

There is no need to update a after each individual draw. 
Splitting the total costs into a sequence < v- x > < . . . < 
c^ pS> — c, not necessarily equidistant, we can predefine 
when and how often an update in a a is made. Namely, 
this is done whenever the actually spent costs reach the 
next value of the sequence. 

The dynamic strategy can be cast into an algorithm. 



(54) Algorithm. Set the initial values ~ 



n 



(0) 



ah ' — a ec . In the v-th step of the iteration, v = 1, 
determine 

» - 



= o, 
■ ■ ,p, 

(57) 
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with 



JV(") 



% C + Po Ci 



(58) 



where [ J means rounding to the next lower integer. 

additional forward and 



Then, 



An 



o 

O) 
i 



-O 

to 

O 



0.5 



Auj ' = rz.^ — n\ additional reverse work values are 
drawn. Using the entire present samples, an estimate 

— („) 

Af of A/ is calculated according to Eq. (|T3[) . With 

the free-energy estimate at hand, M^(a) is calculated 
for all values of a G [0, 1] via Eqs. (gU) (gTD and (gSJ), 
discrctized, say in steps Aa = 0.01. If M^\a) is con- 
vex, we update the recent estimate Oo of a Q to 2o y+1 ^ 
via Eqs. (|55|) and (|56| . Otherwise, if M^ u '(a) is not con- 
vex, the corresponding estimate of a is not yet reliable 
and we keep the recent value, 1 







A 




\ 0.5 
\ 




A 




-2 




X 


2 



cxo^ ■ Increasing 



FIG. 2: The main figure displays the exponential work densi- 
ties po (thick line) and pi (thin line) for the choice of fio = 10 
and, according to the fluctuation theorem, fii = 10/11. 
The inset displays the corresponding Boltzmann distributions 
po(x,y) (thick) and pi(x,y) (thin) both for y = 0. Here, u>o 



i ., ,. , ,. .,i T-i rrnn i-i is set equal to 1 arbitrarily, hence u, = (1 + uoWi = 11. The 
v by one, we iteratively continue with Eq. t|57 [) until we H . . ' . v _ r > 2 7 2\ 



— (p) 

finally obtain Af which is the optimal estimate of the 
free-energy difference after having spend all costs c. 
Note that an update in Sio may result in negative 



values of An[^' or An\"' . Should An^' happen to be 



» 



O) 



negative, we set n, 



>-i) 



and 



» 



c n 



ci 



We proceed analogously, if An["' happens to be negative. 

The optimal fraction a depends on the cost ratio 
cx/cq, i.e. the algorithm needs to know the costs Cq and 
c\. However, the costs are not always known in advance 
and may also vary over time. Think of a long time ex- 
periment which is subject to currency changes, inflation, 
terms of trade, innovations, and so on. Of advantage 
is that the dynamic sampling strategy is capable of in- 
corporating varying costs. In each iteration step of the 
algorithm one just inserts the actual costs. If desired, 
the breakpoints may also be adapted to the actual 
costs. Should the costs initially be unknown (e.g. the 
"correlation-length" of a Monte-Carlo simulation needs 
to be determined within the simulation first) one may 
use any reasonable guess until the costs are known. 



(59) 



VIII. AN EXAMPLE 

For illustration of results we choose exponential work 
distributions 



Pi(w) = —e "t , w G £1 = R^ 
Mi 



(60) 



fii > 0, i = 0, 1. According to the fluctuation theorem 
© we have Ml = jf^ and Af = ln(l + p ). 

Exponential work densities arise in a natural way in 
the context of a two-dimensional harmonic oscillator with 



free-energy difference is Af = ln(l + /io) = ln(uJi /uJq) ~ 2.38 . 



Boltzmann distribution p(x,y) — ( x +v )/Z, where 

Z = 2tt/lu 2 is a normalizing constant (partition func- 
tion) and (x,y) G R 2 [28j]. Drawing a point (x,y) from 
the initial density p = p 0l defined by setting lu = u>o, 
and switching the frequency to lo\ > loq instantaneously 
amounts in the work \{<jj\ — 0Jq)(x 2 + y 2 )- The probabil- 
ity density of observing a specific work value w is given 

2 _ 2 

by the exponential density po with u = 1 a . Switch- 
ing the frequency in the reverse direction, u>\ — + ujq, with 
the point (x,y) drawn from p = p\ with u> = u>i, the 
density of work (with interchanged sign) is given by pi 

2 _ 2 

with pi = UJl ul 2 UJ ° — -l+°u ■ The free-energy difference of 
the states characterized by p and p\ is the log-ratio of 
their normalizing constants, Af = — ln ^ = ln(l + po). 
A plot of the work densities for p = 10 is enclosed in 

Fig.rj 

Now, with regard to free-energy estimation, is it better 
to use one- or two-sided estimators? In other words, we 
want to know whether the global minimum of M(a) is 
on the boundaries {0, 1} of a or not. By the convexity of 
M, the answer is determined by the signs of the deriva- 
tives M'(0) and M'(l) at the boundaries. The asymp- 
totic mean square errors (fT5|) and (fTT)]) of the one-sided 
estimators are calculated to be 



M(l) = Var ( 



Mo 



l + 2^o 



for the forward direction and 



M(0) = Vari(e 



w-Af\ _ 



Mo 



1 



2 ' 

Mo 



Mo < 1, 



(61) 



(62) 



for the reverse direction. For p a > 1 the variance of 
the reverse estimator diverges. Note that M(0) > M(l) 
holds for all po > 0, i- e - forward estimation of Af is 



8 



■a 
c 
to 



10 



10" 





M(a) 

II -10 4 , 









0.2 



0.4 0.6 

a 



0.8 



0.9 



0.8 



10 



10 



10' 



20 



10' 



30 



FIG. 3: The overlap function U a and the rescaled asymptotic 
mean square error M for po = 1000. Note that M(a) diverges 
for a — > 0. 



always superior if compared to reverse estimation. Fur- 
thermore, a straightforward calculation gives 



M'(l) 



MoG"o + £-)(Mo -£+) 



(l + 2 Mo ) 2 (l + 3^ ) 
where £± = ^(\/l7 ± 3), and 



M'(0) = - 



Mo (2 + (1 - 2/xqVq) 
(1-^(1-2^) ! 



^ 0< 2' 



(63) 



(64) 



and M'(0) = — oo for po > i. Thus, for the range 
Mo G (0,C+) we have M'(0) < as well as M'(l) < 
and therefore a a — 1, i.e. the forward estimator is supe- 
rior to any two-sided estimator (fT3|) in this range. For 
Mo G (£+>°°) we have M'(0) < and M'(l) > 0, speci- 
fying that a £ (0, 1), i.e. two-sided estimation with an 
appropriate choice of a is optimal. 

Numerical calculation of the function U a and subse- 
quent evaluation of M(a) allows to find the "exact" op- 
timal fraction a . Examples for U a and M are plotted 
in Fig. |3 

The behavior of a as a function of po is quite interest- 
ing, see Fig. UJ We can interpret this behavior in terms 
of the Boltzmann distributions as follows. Without loss 
of generality, assume loq = 1 is fixed. Increasing po then 
means increasing u>\. The density p\ is fully nested in po, 
cf. the inset of Fig. [2] (remember that u>i > u>q) and con- 
verges to a delta-peak at the origin with increasing u>%. 
This means that by sampling from po we can obtain in- 
formation about the full density p\ quite easily, whereas 
sampling from p\ provides only poor information about 
Po . This explains why a = 1 holds for small values of po . 
However, with increasing oji the density p\ becomes so 
narrow that it becomes difficult to obtain draws from pa 
that fall into the main part of p\. Therefore, it is better to 
add some information from pi, hence, a decreases. In- 
creasing u>\ further, the relative number of draws needed 
from p\ will decrease, as the density converges towards 
the delta distribution. Finally, it will become sufficient 
to make only one draw from p\ in order to obtain the full 



FIG. 4: The optimal fraction a = 7^ of forward work val- 
ues for the two-sided estimation in dependence of the average 
forward work po- For po < £+ ~ 3.56 the one-sided forward 
estimator is optimal, i.e. a — 1. 
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FIG. 5: (Color online) Example of a single run using the dy- 
namic strategy: the optimal fraction a of forward measure- 
ments for the two-sided free-energy estimation is estimated 
at predetermined values of total sample sizes N = no + ni of 
forward and reverse work values. Subsequently, taking into 
account the current actual fraction a = additional work 
values are drawn such that we come closer to the estimated 



information available. Therefore, a converges towards 1 
in the limit /io — > 00. 

In the following the dynamic strategy proposed in 
Sec. IVIII is applied. We choose po = 1000 and c = c\. 
The equal cost strategy draws according to a ec = 0.5 
which is used as initial value in the dynamic strategy. 
The results of a single run are presented in Figs. EH7] 
Starting with N = 100, the estimate of a is updated 
in steps of AiV = 100. The actual forward fractions a 
together with the estimated values of the optimal frac- 
tion a are shown in Fig. [5] The first three estimates of 
a a are rejected, because the estimated function M(a) is 
not yet convex. Therefore, a remains unchanged at the 
beginning. Afterwards, a follows the estimates of a and 
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FIG. 6: (Color online) Displayed are estimated mean square 
errors M in dependence of a for different sample sizes. The 
global minimum of the estimated function M determines the 
estimate of the optimal fraction a of forward work measure- 
ments. 



— — — Af exact 




1000 2000 3000 4000 

N 

FIG. 7: (Color online) Comparison of a single run of free- 
energy estimation using the equal cost strategy versus a single 
run using the dynamic strategy. The errorbars are the square 
roots of the estimated mean square error X. 

starts to fluctuate about the "exact" value of a . Some 
estimates of the function M corresponding to this run are 
depicted in Fig. [51 For these estimates a is discretized 
in steps Aa = 0.01 . Remarkably, the estimates of a a 
that result from these curves are quite accurate even for 
relatively small N. Finally, Fig. [7] shows the free-energy 
estimates of the run (not for all values of N) , compared 
with those of a single run where the equal cost strategy 
is used. We find some increase of accuracy when using 
the dynamic strategy. 

In combination with a good a priori choice of the ini- 
tial value of a, the use of the dynamic strategy enables 
a superior convergence and precision of free-energy esti- 
mation, see Figs. [5] and [5] Due to insight into some par- 
ticular system under consideration, it is not unusual that 



FIG. 8: (Color online) Averaged estimates from 10 000 in- 
dependent runs with dynamic strategy versus 10 000 runs 
with equal cost strategy in dependence of the total cost 
c = rioco + n\C\ spend. The cost ratio is ci/co = 0.01, 
Co + ci = 2, and fio = 1000. The errorbars represent one 
standard deviation. Here, the initial value of a in the dy- 
namic strategy is 0.5, while the equal cost strategy draws 
with a ec « 0.01. We note that a » 0.08. 




FIG. 9: (Color online) Displayed are mean square errors of 
free-energy estimates using the same data as in Fig. [8] In 
addition, the mean square errors of estimates with constant 
a = a are included, as well as the asymptotic behavior, 
Eq. fl 5 1 p . The inset shows that the mean square error of 
the dynamic strategy approaches the asymptotic optimum, 
whereas the equal cost strategy is suboptimal. Note that for 
small sample sizes the asymptotic behavior does not represent 
the actual mean square error. 



one has a priori knowledge which results in a better guess 
for the initial choice of a in the dynamic strategy than 
starting with a — a ec . For instance, a good initial choice 
is known when estimating the chemical potential via 
Widom's particle insertion and deletion [3l|. Namely, it 
is a priori clear that inserting particles yields much more 
information then deleting particles, since the phase-space 
which is accessible to particles in the "deletion-system" is 
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effectively contained in the phase-space accessible to the 
particles in the "insertion-system" , cf. e.g. [To] ]. A good a 
priori initial choice for a may be a = 0.9 with which the 
dynamic strategy outperforms any other strategy that 
the authors are aware of. 

Once reaching the limit of large sample sizes, the dy- 
namic strategy is insensitive to the initial choice of a, 
since the strategy is robust and finds the optimal frac- 
tion a of forward measurements itself. 



IX. CONCLUSION 

Two-sided free-energy estimation, i.e. the acceptance 
ratio method [26j ]. employs samples of iiq forward and 
n\ reverse work measurements in the determination of 
free-energy differences in a statistically optimal manner. 
However, its statistical properties depend strongly on the 
ratio — of work values used. As a central result we have 

n 

proven the convexity of the asymptotic mean square er- 
ror of two-sided free-energy estimation as a function of 
the fraction a — ^ of forward work values used. From 
here follows immediately the existence and uniqueness of 
the optimal fraction a which minimizes the asymptotic 
mean square error. This is of particular interest if we 
can control the value of a, i.e. can make additional mea- 
surements of work in either direction. Drawing such that 
we finally reach ^ = a Q , the efficiency of two-sided esti- 
mation can be enhanced considerably. Consequently, we 
have developed a dynamic sampling strategy which itcr- 
atively estimates a and makes additional draws or mea- 
surements of work. Thereby, the convexity of the mean 
square error enters as a key criterion for the reliability 
of the estimates. For a simple example which allows to 
compare with analytic calculations, the dynamic strategy 
has shown to work perfectly. 

In the asymptotic limit of large sample sizes the dy- 



namic strategy is optimal and outperforms any other 
strategy. Nevertheless, in this limit it has to compete 
with the near optimal equal cost strategy of Bennett 
which also performs very good. It is worth mentioning 
that even if the latter comes close to the performance 
of ours, it is worthwhile the effort of using the dynamic 
strategy, since the underlying algorithm can be easily im- 
plemented and does cost quite anything if compared to 
the effort required for drawing additional work values. 

Most important for experimental and numerical esti- 
mation of free-energy differences is the range of small 
and moderate sample sizes. For this relevant range, it 
is found that the dynamic strategy performs very good, 
too. It converges significantly better than the equal cost 
strategy. In particular, for small and moderate sample 
sizes it can improve the accuracy of free-energy estimates 
by half an order of magnitude. 

We close our considerations by mentioning that the 
two-sided estimator is typically far superior with respect 
to one-sided estimators: assume the support and po and 
Pi is symmetric about A/ [32| : then, if the densities are 
symmetric to each other, po(Af +w) = Pi(Af — w), the 
optimal fraction of forward draws is ^ = h by sym- 
metry. Therefore, if the symmetry is violated not too 
strongly, the optimum will remain near 0.5 . Continuous 
deformations of the densities change the optimal fraction 
a a continuously. Thus, a does not reach and 1, re- 
spectively, for some certain strength of asymmetry. It is 
exceptionally hard to violate the symmetry such that a 
hits the boundary or 1. In consequence, in almost all 
situations, the two-sided estimator is superior. 
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